Semi-supervised Learning for Automatic Prosodic Event Detection Using Co-training Algorithm
Abstract
Most previous approaches to automatic prosodic event detection are based on supervised learning and rely on a corpus annotated with the prosodic labels of interest to train the classification models. However, creating such resources is expensive and time-consuming. In this paper, we exploit semi-supervised learning with the co-training algorithm for automatic detection of coarse-level representations of prosodic events such as pitch accents, intonational phrase boundaries, and break indices. We propose a confidence-based method to assign labels to unlabeled data and demonstrate improved results using this method compared to the widely used agreement-based method. In addition, we examine various informative sample selection methods. In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most confidence sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the
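To make the contrast between agreement-based and confidence-based labeling concrete, here is a minimal Python sketch of the labeling and selection step for two classifiers trained on different views. It is one plausible reading of the abstract, not the paper's exact procedure: the function names (`agreement_label`, `confidence_label`, `select_most_confident`), the use of the maximum class probability as the confidence score, and the toy posteriors are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Posterior = List[float]  # class probabilities from one view's classifier
Labeled = Optional[Tuple[int, float]]  # (label, confidence), or None if skipped


def predict(p: Posterior) -> Tuple[int, float]:
    """Return (most likely class index, its probability)."""
    label = max(range(len(p)), key=lambda c: p[c])
    return label, p[label]


def agreement_label(p_a: Posterior, p_b: Posterior) -> Labeled:
    """Agreement-based: label a sample only when both views predict the same class."""
    label_a, conf_a = predict(p_a)
    label_b, conf_b = predict(p_b)
    if label_a != label_b:
        return None
    # Assumption: score the sample by the less confident of the two views.
    return label_a, min(conf_a, conf_b)


def confidence_label(p_a: Posterior, p_b: Posterior) -> Labeled:
    """Confidence-based (assumed form): take the label of the more confident view."""
    label_a, conf_a = predict(p_a)
    label_b, conf_b = predict(p_b)
    return (label_a, conf_a) if conf_a >= conf_b else (label_b, conf_b)


def select_most_confident(
    posts_a: List[Posterior],
    posts_b: List[Posterior],
    k: int,
    labeler: Callable[[Posterior, Posterior], Labeled] = confidence_label,
) -> List[Tuple[int, int]]:
    """Return (sample index, label) for the k most confidently labeled samples."""
    scored = []
    for i, (p_a, p_b) in enumerate(zip(posts_a, posts_b)):
        result = labeler(p_a, p_b)
        if result is not None:
            label, conf = result
            scored.append((conf, i, label))
    # Highest confidence first; ties broken by sample index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(i, label) for _, i, label in scored[:k]]


# Toy posteriors for three unlabeled samples (hypothetical values).
view_a = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
view_b = [[0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]

picked_by_confidence = select_most_confident(view_a, view_b, k=2)
picked_by_agreement = select_most_confident(view_a, view_b, k=2, labeler=agreement_label)
```

In a full co-training loop, the selected samples would be moved from the unlabeled pool into the training set and both classifiers retrained, repeating until the pool is exhausted or a stopping criterion is met. Note that the agreement-based rule skips the first toy sample because the views disagree, while the confidence-based rule still labels it.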
Related resources
Semi-supervised extractive speech summarization via co-training algorithm
Supervised methods for extractive speech summarization require a large training set, and summary annotation is often expensive and time-consuming. In this paper, we exploit semi-supervised approaches to leverage unlabeled data. In particular, we investigate co-training for the task of extractive meeting summarization. Compared with text summarization, the speech summarization task has its unique charac...
Unsupervised Prosodic Break Detection in Mandarin Speech
We propose that, in Mandarin speech, an automatic prosodic break detector can be trained without any prosodically labeled training data. We use only lexical and acoustic cues to create a small labeled training set, then use semi-supervised learning to train a prosodic break detector. A generative mixture model is proposed as the learning algorithm that can learn with both labeled and unlabeled ...
Co-training using prosodic and lexical information for sentence segmentation
We investigate the application of the co-training learning algorithm to the sentence boundary classification problem using lexical and prosodic information. Co-training is a semi-supervised machine learning algorithm that uses multiple weak classifiers with a relatively small amount of labeled data and incrementally uses unlabeled data. The assumption in co-training is that the classifiers can...
A Comparison of Machine Learning Methods for Extractive Persian Speech-to-Speech Summarization Without Transcripts
In this paper, extractive speech summarization using different machine learning algorithms was investigated. Speech summarization extracts the important and salient segments of speech so that speech files can be accessed, searched, and browsed more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...
Emotion Detection in Persian Text; A Machine Learning Model
This study aimed to develop a computational model for recognizing emotion in Persian text as a supervised machine learning problem. We took the Plutchik emotion model as the supervised learning criterion and a Support Vector Machine (SVM) as the baseline classifier. We also used the NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...
Publication date: 2009